Word Prediction Techniques for User Adaptation and Sparse Data Mitigation
Abstract
The field of Augmentative and Alternative Communication (AAC) seeks to minimize the effects of speech disorders and allow better communication. High-tech AAC devices are electronic devices that address the dual issue of speech impairment and reduced motor control. Communication rate can be improved substantially using word prediction, an application of language modeling to text entry. Word prediction relies on the letters and words that the user has entered so far to suggest likely words that the user is in the process of typing. Although we can increase language model quality with more training data, these efforts are unlikely to increase the average relevance of training texts. Instead, many documents will be dissimilar to testing data. Even if a few relevant texts are added to the training data, their contribution is marginalized by the abundance of irrelevant texts. We address the problem of varying relevance through adaptive language modeling, specifically topic adaptation and style adaptation. We found that topic adaptation and style adaptation each increase keystroke savings on their own, and also when the two are combined. We have addressed the problem of irrelevant training data with cache models that learn new words and model lexical repetition, and we also integrated a large word list using methods developed for the part-of-speech n-gram model. We have focused on general-purpose improvements in language modeling and natural language processing, and many of our methods may be applicable to related language modeling problems. However, we have pursued only those language modeling improvements that we feel are well suited to word prediction and AAC.
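To make the idea of word prediction concrete, here is a minimal sketch and not the method used in the thesis. It assumes a simple bigram model interpolated with a unigram model and a cache of the user's recent words; the function names (`train_bigrams`, `predict`, `keystroke_savings`) and the interpolation weights are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count unigrams and bigrams from tokenized sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        prev = "<s>"  # sentence-start marker
        for word in tokens:
            unigrams[word] += 1
            bigrams[prev][word] += 1
            prev = word
    return unigrams, bigrams


def predict(prefix, prev_word, unigrams, bigrams, cache=None,
            cache_weight=0.3, n=3):
    """Return up to n words starting with `prefix`, best first.

    The score mixes a static model (bigram + unigram) with a cache of
    recently typed words. All weights are illustrative assumptions.
    """
    cache = cache or Counter()
    total_uni = sum(unigrams.values()) or 1
    following = bigrams.get(prev_word, {})
    total_bi = sum(following.values()) or 1
    total_cache = sum(cache.values()) or 1

    # Candidates come from the training vocabulary and from the cache,
    # so words the user introduced recently can also be suggested.
    candidates = {w for w in set(unigrams) | set(cache)
                  if w.startswith(prefix)}

    def score(word):
        static = (0.7 * following.get(word, 0) / total_bi
                  + 0.3 * unigrams[word] / total_uni)
        recent = cache[word] / total_cache
        return (1 - cache_weight) * static + cache_weight * recent

    # Sort by descending score, breaking ties alphabetically.
    return sorted(candidates, key=lambda w: (-score(w), w))[:n]


def keystroke_savings(chars_total, keys_pressed):
    """Percentage of keystrokes saved relative to typing every character."""
    return 100.0 * (chars_total - keys_pressed) / chars_total


corpus = [["i", "want", "water"],
          ["i", "want", "to", "walk"],
          ["i", "was", "tired"]]
unigrams, bigrams = train_bigrams(corpus)
suggestions = predict("w", "i", unigrams, bigrams)
learned = predict("wh", "i", unigrams, bigrams,
                  cache=Counter({"wheelchair": 2}))
```

In this toy setup, `suggestions` ranks words beginning with "w" by how likely they are to follow "i", while `learned` shows how a cache lets the predictor offer a word that never appeared in the training text. Topic and style adaptation, as described in the abstract, would instead reweight the training texts themselves and are not modeled here.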
Related resources
Word Prediction Techniques for User Adaptation and Sparse Data Mitigation Ph.D. Thesis Proposal
In the United States alone, it is estimated that approximately two million people suffer from a speech disability severe enough to create a difficulty in being understood [9]. This affects individuals in all aspects of their lives, from education to the workplace and in their personal lives. High-tech AAC devices are electronic devices that take letter and word input and produce speech output. ...
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
A NOVEL FUZZY-BASED SIMILARITY MEASURE FOR COLLABORATIVE FILTERING TO ALLEVIATE THE SPARSITY PROBLEM
Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...
Integrating Climate Change Adaptation and Mitigation with Urban planning for a Livable city in Tehran
Climate change impacts are seen in a growing number of cities in low- and middle-income countries, so there is growing interest in the adaptation and mitigation plans and programs put forward by city authorities. This paper aims to provide a better understanding of the constraints that cities face in this area by analyzing the case of Tehran. City has a commitment to decentralizatio...
Relevance vector machine and multivariate adaptive regression spline for modelling ultimate capacity of pile foundation
This study examines the capability of the Relevance Vector Machine (RVM) and Multivariate Adaptive Regression Spline (MARS) for prediction of ultimate capacity of driven piles and drilled shafts. RVM is a sparse method for training generalized linear models, while MARS technique is basically an adaptive piece-wise regression approach. In this paper, pile capacity prediction models are developed...